Generalized Hierarchical Word Sequence Framework for Language Modeling


Similar resources

A Generalized Framework for Hierarchical Word Sequence Language Model

Language modeling is a fundamental research problem with wide application in many NLP tasks. To estimate the probabilities of natural language sentences, most research on language modeling uses n-gram based approaches to factor sentence probabilities. However, the assumption underlying n-gram models is not robust enough to cope with the data sparseness problem, which affects the final performance o...


A Hierarchical Word Sequence Language Model

Most language models used for natural language processing are continuous. However, the assumption behind such models is too simple to cope with the data sparsity problem. Although many useful smoothing techniques have been developed to estimate these unseen sequences, it is still important to make full use of the contextual information in the training data. In this paper, we propose a hierarchical word seque...


An Improved Hierarchical Word Sequence Language Model Using Directional Information

To relieve the data sparsity problem, the Hierarchical Word Sequence (HWS) language model, which uses word frequency information to convert raw sentences into special n-gram sequences, can be viewed as an effective alternative to the normal n-gram method. In this paper, we use directional information to make HWS models more syntactically appropriate so that higher performance can be achie...


Bayesian Unsupervised Word Segmentation with Hierarchical Language Modeling

This paper proposes a novel unsupervised morphological analyzer for arbitrary languages that needs neither supervised segmentation nor a dictionary. Assuming a string is the output of a nonparametric Bayesian hierarchical n-gram language model of words and characters, “words” are iteratively estimated during inference by a combination of MCMC and efficient dynamic programming. This model c...


Hierarchical Character-Word Models for Language Identification

Social media messages’ brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.



Journal

Journal title: Journal of Natural Language Processing

Year: 2017

ISSN: 1340-7619,2185-8314

DOI: 10.5715/jnlp.24.395